OVERVIEW

The code in this replication package conducts the analysis for the article "News Sharing on Social Media: Mapping the Ideology of News Media Content, Politicians, and the Mass Public". All code is in R. Four files run all of the code necessary to generate the figures and tables in the main article and appendix. Running the statistical models from scratch will take 3+ weeks on a standard 6-core laptop, and 5-7 days on a 48+ core high-performance computing environment. As a result, pre-fitted models are provided as part of the replication archive. Data for the analysis that relies on survey-linked social media data (a small part of the full analysis) cannot be provided for confidentiality reasons, as noted further below.


DATA AVAILABILITY AND PROVENANCE STATEMENTS

The raw data for the article come from the social media posts of politicians, randomly selected ordinary users, and YouGov survey respondents who consented to their survey data being linked to their Twitter feeds. For privacy/confidentiality reasons, survey-linked social media data are not provided in the replication archive. Although the code that uses these data cannot be run, the code itself is nevertheless provided in the replication archive. Furthermore, consistent with EU GDPR requirements, no raw individual-level Twitter data (i.e., individual tweets) are provided in the archive. The aggregated data that the models rely on are all provided, however, and these are the data required to run the models and produce the figures/tables in the article (aside from one figure that uses the survey-linked social media data from YouGov).


STATEMENT ABOUT RIGHTS

I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.


SUMMARY OF AVAILABILITY

Some data cannot be made publicly available, as detailed below.


DETAILS ON EACH DATA SOURCE

The primary data in the article are the tweets from US political actors, ordinary US Twitter users, and survey respondents (from YouGov) who consented to have their Twitter timelines linked to their survey responses.

Almost all data necessary for replication of the descriptive statistics, figures, and tables are provided in the replication archive. Due to the EU's GDPR privacy restrictions, the data that are not provided are the raw tweet data for politicians, users, and YouGov respondents. Because the model and analysis introduced in the article rely only on aggregates of these data, all tables and all but one figure (Appendix Figure A1) can be reproduced with the replication data and code. A small number of descriptive statistics (those using the survey-linked social media data from YouGov) also cannot be replicated without the raw data.

The data that are available are detailed in the DATASET LIST below.



DATASET LIST

The following data and pre-fitted models are provided in the replication archive:

Ideology scores of politicians (Barberá, 2015):
./account_data/Barbera2015_Politicians.csv

Ideology scores of ordinary users (Barberá, 2015):
./account_data/Barbera2015_Users.csv

Descriptive data for US political actors and their Twitter IDs:
./account_data/Politicians.csv

Hashed and salted IDs of ordinary users' Twitter IDs (for merging):
./account_data/Users.csv

Matrix of the counts of tweets of news stories by political actors and ordinary users:
./count_matrix/Y_Users_Politicians.rds

Matrix of the counts of tweets of news stories by political actors:
(by each single year of data)
./count_matrix/Y_Politicians_2017.rds
./count_matrix/Y_Politicians_2018.rds
./count_matrix/Y_Politicians_2019.rds
./count_matrix/Y_Politicians_2020.rds

Matrix of the counts of tweets of news stories by political actors:
(all years combined)
./count_matrix/Y_Politicians.rds

Pre-fitted models:
./fitted_models/model_users.RDS
./fitted_models/model_users_no_prior.RDS
./fitted_models/model_politicians.RDS
./fitted_models/model_politicians_common_prior.RDS
./fitted_models/model_politicians_2020.RDS
./fitted_models/model_politicians_2019.RDS
./fitted_models/model_politicians_2018.RDS
./fitted_models/model_politicians_2017.RDS

The number of URLs tweeted by each political actor and user:
./tweet_urls/URLs_per_tweet.csv

Descriptive statistics on the number of URLs tweeted by all political actors and users:
./tweet_urls/URL_descriptives.csv

The output media scores (news-sharing ideology scores) for members of the 116th Congress and for news organizations:
./MOC_116_Media_Scores.csv
./News_Domain_Media_Scores.csv
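
The .rds/.RDS files listed above are serialized R objects and the .csv files are plain comma-separated text. As a minimal sketch, assuming the working directory is the root of the replication archive, a few of them could be loaded and inspected as follows. The helper name load_if_present is hypothetical (it is not part of the archive), and because the internal structure of the loaded objects is not documented here, the sketch only inspects them with str().

```r
# Hypothetical helper (not part of the archive): read a file if it exists,
# otherwise print a message and return NULL.
load_if_present <- function(path) {
  if (!file.exists(path)) {
    message("Not found (is the working directory the archive root?): ", path)
    return(NULL)
  }
  # Compare extensions case-insensitively: the archive uses both .rds and .RDS.
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    readRDS(path)
  } else if (ext == "csv") {
    utils::read.csv(path, stringsAsFactors = FALSE)
  } else {
    stop("Unsupported file type: ", path)
  }
}

counts <- load_if_present("./count_matrix/Y_Politicians.rds")
model  <- load_if_present("./fitted_models/model_politicians.RDS")
scores <- load_if_present("./MOC_116_Media_Scores.csv")

# Inspect whatever was found without assuming a particular structure.
if (!is.null(counts)) str(counts, max.level = 1)
if (!is.null(scores)) str(scores)
```

The fitted model files may be large, so loading only the objects needed for a given figure or table keeps memory use down.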


COMPUTATIONAL REQUIREMENTS

R (4.2.2)
  - tidyverse (2.0.0)
  - modelsummary (1.4.2)
  - kableExtra (1.3.4)
  - cowplot (1.1.1)
  - rstan (2.21.8)
  - mediascores (0.0.0.9000)
  - overlap (0.3.4)
  - ggplot2 (3.5.0)
  - grid (4.2.2)
  - gridExtra (2.3)
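
Before running the analysis, it may help to compare the locally installed package versions against the versions listed above. The following is a sketch only: the object names required, installed, and version_report are illustrative, and a version mismatch does not necessarily mean the code will fail.

```r
# Versions as listed in this README.
required <- c(
  tidyverse = "2.0.0", modelsummary = "1.4.2", kableExtra = "1.3.4",
  cowplot = "1.1.1", rstan = "2.21.8", mediascores = "0.0.0.9000",
  overlap = "0.3.4", ggplot2 = "3.5.0", grid = "4.2.2", gridExtra = "2.3"
)

# Look up the installed version of each package without attaching it;
# NA marks a package that is not installed.
installed <- vapply(names(required), function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

version_report <- data.frame(
  package   = names(required),
  required  = unname(required),
  installed = unname(installed),
  matches   = !is.na(installed) & installed == required,
  row.names = NULL
)
print(version_report)
cat("R version in use:", as.character(getRversion()), "(listed: 4.2.2)\n")
```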

The models were run on a 64-core high-performance computing cluster (with 128GB RAM), which took roughly 5-7 days to complete. On a ~6-core desktop/laptop (from 2019), the models may take 3+ weeks to run.

Using the pre-fitted models, the analysis code will take around 10 minutes or less in total to run.


DESCRIPTION OF PROGRAMS/CODE

  MODEL COMPILATION

  Note: Running the models from scratch will take 3+ weeks on a standard laptop. On an HPC with ~48 cores, they may take roughly 5-7 days. Pre-fitted models are included so that replicators do not need to run these models.

  ./1_Run_Models.R
  Runs all of the media score models from scratch (except the model fit to the YouGov data, which is omitted for data privacy reasons)

  ./1_Run_Models.sh
  This shell file was used to run the models on the high-performance computing cluster at NYU. This file is not necessary if running the models from a desktop. HPCs will differ, and thus replicators may need to create a different shell file if the models are to be run on an HPC.
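
  Because run time depends heavily on the number of cores, replicators running the models from a desktop may want to check how many cores R can see before starting. The sketch below sets the standard mc.cores option, which rstan's sampling functions use as their default number of cores. Whether 1_Run_Models.R sets or overrides this option itself is not stated here, so this is an assumption rather than a description of the script.

```r
# Number of logical cores visible to R (can be NA on some platforms).
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# rstan's sampling functions default to getOption("mc.cores", 1L).
options(mc.cores = n_cores)
message("Cores detected: ", n_cores, "; mc.cores set to ", getOption("mc.cores"))
```

  On a shared HPC node, detectCores() reports the cores of the whole node rather than the cores allocated to the job, so the value may need to be set by hand there.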


  ANALYSIS

  Note: The analysis files can be run in any order.

  ./0_R_Libraries.R
  Installs all R libraries necessary for analysis

  ./0_Run_Replication.sh
  Runs the files 2_Analysis.R and 3_Analysis_by_Time_Period.R, which produce
  all figures and tables available for replication in this archive.

  ./2_Analysis.R
  Computes Tables 1 & 2, Figures 1-8, and Figures A2-A6. Also computes the majority of the summary statistics in the main article.

  ./3_Analysis_by_Time_Period.R
  Computes Figure A7 and Table A2 in the Appendix, and relevant summary statistics for Appendix G.

  ./4_Analysis_YouGov.R
  Computes Figure A1 in the Appendix, and relevant summary statistics for Appendix A.

  ./ggplot_theme.R
  A ggplot2 theme that is loaded from each of the analysis R files above.
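
  For replicators who prefer to work from an interactive R session rather than the shell, the replicable part of the analysis could be run roughly as sketched below. This assumes the working directory is the archive root. It is not a transcription of 0_Run_Replication.sh, whose contents are not reproduced in this README. 4_Analysis_YouGov.R is left out because the data it needs are not in the archive.

```r
# Scripts that can be run with the data provided in the archive.
scripts <- c(
  "./0_R_Libraries.R",
  "./2_Analysis.R",
  "./3_Analysis_by_Time_Period.R"
)

# Stop early with a clear message if the working directory is wrong.
missing_scripts <- scripts[!file.exists(scripts)]
if (length(missing_scripts) > 0) {
  message("Not found: ", paste(missing_scripts, collapse = ", "))
} else {
  for (script in scripts) {
    message("Running ", script)
    source(script, echo = FALSE)
  }
}
```

  Sourcing the files in one session means that objects created by one script remain in the workspace for the next. Running each file in a fresh session (for example with Rscript) avoids any such carry-over.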


LIST OF WHAT CANNOT BE REPLICATED DUE TO CONFIDENTIAL DATA RESTRICTIONS

(1) Appendix Figure A1 (reason: privacy-protected YouGov data)
(2) Summary statistics in Appendix A (reason: privacy-protected YouGov data)
(3) Summary statistics in the last paragraph of the "Validation" section
    (reason: privacy-protected YouGov data)


